Abstract

We consider communication over a noisy network under randomized linear network coding. Possible
error mechanisms include node or link failures, Byzantine behavior of nodes, or an over-estimate of the
network min-cut. Building on the work of Kötter and Kschischang, we introduce a probabilistic model
for errors. We compute the capacity of this channel and we define an error-correction scheme based on
random sparse graphs and a low-complexity decoding algorithm. By optimizing over the code degree
profile, we show that this construction achieves the channel capacity in complexity which is jointly
quadratic in the number of coded information bits and sublogarithmic in the error probability.

1 Introduction 

Consider a wire-line communication network modeled as a directed acyclic (multi-)graph with edges of 
unit capacity. A source wants to communicate information to a set of receivers. If we allow processing
of information at nodes in the network then the achievable throughput is in general higher than what 
can be achieved by schemes that only allow routing [1,9]. Schemes that employ processing are referred 
to as network coding schemes. 

The standard assumption in the network coding literature is that no errors are introduced within 
the network or, equivalently, that sufficiently powerful error-correcting codes are employed on the links 
at the physical layer. However, a number of error sources (e.g., malicious or malfunctioning nodes)
cannot be neglected. We consider a probabilistic model for transmission errors that builds upon the
work of Kschischang and Kötter [7,14]. We compute the information-theoretic limit on point-to-point
communication for this model (the channel capacity) and define a coding scheme based on a sparse-graph
construction and a low-complexity iterative decoding algorithm. We show that the parameters
of the construction can be optimized analytically and, remarkably, the optimized scheme achieves the
channel capacity. This is the second channel model for which iterative schemes can be shown to achieve
capacity (the first one being the binary erasure channel; this was shown in the seminal work of Luby,
Mitzenmacher, Shokrollahi, Spielman, and Stemann [10]).



2 Network Coding: Background and Related Work 

Assume that an information source generates h symbols per unit time. The integer h is referred to as 
the source rate. Information is encoded at the sender in packets of length $N$ with entries from a finite
field $\mathbb{F}_q$. The network is assumed to be synchronous and without delay. As a consequence, packets are
aligned at the destination at regular time intervals. 

The most common scenario studied in this context is a multicast one in which the source aims
at communicating the same information to a set of receivers (distinct nodes in the same network).
The fundamental theorem of network coding states that this is possible using network coding (i.e.,
processing at the nodes) if the value of the min-cut from the source to each of the receivers is at least
$h$ [1]. Moreover, linear network coding suffices [9]. This means that processing at the nodes can be
limited to forwarding packets which are linear (over $\mathbb{F}_q$) combinations of incoming packets. Finally, it is
not necessary to choose the local encoding functions at the nodes carefully. Random linear combinations
are sufficient with probability close to one, provided the cardinality of the field is large enough [6,8].
For a general introduction to network coding we refer the reader to [4, 16, 17].

The preferred method to implement random linear network codes is to include "headers" in the
packets of length $N$ [2]. The role of the headers is to "record" the coefficients used in the local encoding
functions so that the receiver can be oblivious to the network topology and to the specific local encoding
functions used. In more detail, assume that we send $\ell$ packets. The header of each packet is then an
element of $(\mathbb{F}_q)^\ell$, where the header of the $i$-th source packet, $i \in [\ell]$, is the all-zero tuple, except for
an identity element at position $i$. Recall that nodes forward packets which are linear combinations of
the incoming packets. Therefore, if the header of a packet somewhere in the network reads $(\beta_1, \dots, \beta_\ell)$,
$\beta_i \in \mathbb{F}_q$, then we know that this packet is the linear combination of the $\ell$ original source packets, where
the $i$-th original source packet has "weight" $\beta_i$. The significant advantage of such a scheme is that the
receivers can be oblivious to the topology and the local encoding functions. Of course we pay some price:
if we use headers then only $m = N - \ell$ of the $N$ symbols of each packet are available for information
transmission. Our subsequent discussion assumes this "oblivious" model.
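
The header mechanism described above can be illustrated with a minimal sketch of an error-free network. It is only an illustration under stated assumptions: the field is taken to be the prime field of integers modulo 257 (the text allows any finite field), the network is replaced by a routine that hands the receiver random linear combinations of the source packets, and the helper names `rref_mod_p`, `source_packets`, `random_combinations`, and `recover` are hypothetical.

```python
import random

P = 257  # assumed prime, so that integers mod P form a field


def rref_mod_p(rows, p):
    """Reduced row echelon form over GF(p). Returns (matrix, rank)."""
    a = [[v % p for v in row] for row in rows]
    rank = 0
    for col in range(len(a[0])):
        # Look for a pivot in this column at or below row `rank`.
        pivot = next((r for r in range(rank, len(a)) if a[r][col]), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        inv = pow(a[rank][col], -1, p)  # modular inverse (Python 3.8+)
        a[rank] = [(v * inv) % p for v in a[rank]]
        for r in range(len(a)):
            if r != rank and a[r][col]:
                f = a[r][col]
                a[r] = [(x - f * y) % p for x, y in zip(a[r], a[rank])]
        rank += 1
        if rank == len(a):
            break
    return a, rank


def source_packets(payloads):
    """Prepend the i-th unit vector (the header) to the i-th payload."""
    ell = len(payloads)
    return [[1 if j == i else 0 for j in range(ell)] + list(x)
            for i, x in enumerate(payloads)]


def random_combinations(packets, count, p, rng):
    """Stand-in for an error-free network: random linear combinations."""
    out = []
    for _ in range(count):
        coeffs = [rng.randrange(p) for _ in packets]
        out.append([sum(c * pkt[j] for c, pkt in zip(coeffs, packets)) % p
                    for j in range(len(packets[0]))])
    return out


def recover(received, ell, p):
    """Bring the received matrix back to the form [identity | payloads]."""
    reduced, rank = rref_mod_p(received, p)
    if rank < ell:
        return None  # too few independent combinations were received
    return [row[ell:] for row in reduced[:ell]]
```

The receiver never looks at the topology: the headers alone tell it which combination each packet carries, and Gaussian elimination undoes the mixing whenever the received headers have full rank.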

So far we assumed that errors occur neither during transmission nor during processing. If the channel
or the processing is noisy, one can use coding to combat the noise. Note that if we stack the $\ell$ source
packets of length $N$ on top of each other then we get an $\ell \times N$ matrix over $\mathbb{F}_q$ whose, let us say, left $\ell \times \ell$
submatrix (the collection of headers) is the identity matrix.

Formally, a code $\mathfrak{C}$ is a collection of $\ell \times N$ matrices with elements in $\mathbb{F}_q$, such that each $M \in \mathfrak{C}$
takes the form $M = [\mathbf{1} \,|\, x]$. Here, $\mathbf{1}$ is the $\ell \times \ell$ identity matrix and $x$ is an $\ell \times m$ matrix ($m = N - \ell$).
We say that $M$ is in normal form. The code $\mathfrak{C}$ is thus equivalently described by a collection of $\ell \times m$
matrices $\{x\}$. The rate of the code is defined as the ratio of the number of information $q$-bits that can
be conveyed by the choice of codeword ($\log_q |\mathfrak{C}|$) to the number of transmitted symbols ($N\ell$):

$$R(\mathfrak{C}) = \frac{\log_q |\mathfrak{C}|}{N\ell}. \tag{1}$$

Before the source packets are transmitted we multiply $M$ from the left by an $\ell \times \ell$ random invertible
matrix with components in $\mathbb{F}_q$. This "mixes" the rows of $M$ and ensures that, regardless of the network
topology and the location where the errors are introduced, the effect of the errors on the normal
form is uniform. We then transmit each resulting row as one packet.

Upon transmission of $M$, a "corrupted" version $Q$ of the codeword is received. Without loss of
generality, we assume that $Q$ is brought back into normal form $Q = [\mathbf{1} \,|\, y]$ by Gaussian elimination.¹
Following Kötter and Kschischang [7], we model the net effect of the transmission and the processing
"noise" as a low-rank perturbation of $x$. More precisely, we assume that

$$y = x + z, \tag{2}$$

where $z$ is an $\ell \times m$ matrix over $\mathbb{F}_q$ of $\mathrm{rank}(z) = \ell\omega$, $\omega \in [0, 1]$. We call $\ell\omega$ the weight of the error, and
$\omega$ the normalized weight.
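
To make the perturbation model (2) concrete, here is a small sketch that builds a low-rank error matrix and adds it to a codeword. It is an illustration under assumptions, not the authors' construction: the field is the prime field of integers modulo `p`, and the error is formed as a product of two random factors, which guarantees rank at most `r` (and rank exactly `r` with high probability for large fields) but is not an exactly uniform draw over matrices of rank `r`. The names `rank_mod_p`, `random_low_rank`, and `channel` are hypothetical.

```python
import random


def rank_mod_p(rows, p):
    """Rank of a matrix (list of lists of ints) over the prime field GF(p)."""
    a = [[v % p for v in row] for row in rows]
    n_cols = len(a[0]) if a else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(a)) if a[r][col]), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        inv = pow(a[rank][col], -1, p)  # modular inverse (Python 3.8+)
        a[rank] = [(v * inv) % p for v in a[rank]]
        for r in range(len(a)):
            if r != rank and a[r][col]:
                f = a[r][col]
                a[r] = [(x - f * y) % p for x, y in zip(a[r], a[rank])]
        rank += 1
        if rank == len(a):
            break
    return rank


def random_low_rank(ell, m, r, p, rng):
    """ell x m matrix of rank at most r, built as (ell x r) times (r x m)."""
    a = [[rng.randrange(p) for _ in range(r)] for _ in range(ell)]
    b = [[rng.randrange(p) for _ in range(m)] for _ in range(r)]
    return [[sum(a[i][k] * b[k][j] for k in range(r)) % p for j in range(m)]
            for i in range(ell)]


def channel(x, z, p):
    """The perturbation y = x + z, entrywise over GF(p)."""
    return [[(u + v) % p for u, v in zip(row_x, row_z)]
            for row_x, row_z in zip(x, z)]
```

All rows of the error lie in the row space of the second factor, so every row of `y` is the corresponding row of `x` shifted by an element of one common subspace of dimension at most `r`.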

Define the distance of two codewords $x$ and $x'$ as $d(x, x') = \mathrm{rank}(x - x')$ and the minimum distance
$d(\mathfrak{C})$ of the code $\mathfrak{C}$ as the minimum of the distances between all distinct pairs of codewords. The
normalized minimum distance is $\delta(\mathfrak{C}) = d(\mathfrak{C})/\ell$. It is shown in [7] that $d(\cdot, \cdot)$ is a true distance metric;
in particular it fulfills the triangle inequality. Therefore, given a code $\mathfrak{C}$ of minimum distance $d(\mathfrak{C})$, a
simple bounded distance decoder can correct all errors of weight $s = (d(\mathfrak{C}) - 1)/2$ or less. A bounded
distance decoder is an algorithm that, given a received word $y$, decodes $y$ to the unique word within
distance $s$ if such a word exists and declares an error otherwise. Bounded distance decoders are popular
since a suitable algebraic structure on the code often ensures that bounded distance decoding can be
accomplished with low complexity.

¹ In principle it might be that the received matrix cannot be brought into this form because its first $\ell$ columns have rank
smaller than $\ell$. However, within the probabilistic model which we discuss in the following, the rank deficiency is small with
high probability and can be eliminated by a small perturbation.

The bounded-distance error-correcting capability of a code is defined as $\omega(\mathfrak{C}) = d(\mathfrak{C})/(2\ell) = \delta(\mathfrak{C})/2$.
Kötter and Kschischang showed that the optimal trade-off between $R(\mathfrak{C})$ and $\omega(\mathfrak{C})$ is given by an
appropriate generalization of the "Singleton bound." In the limit $N \to \infty$, with $\ell = \lambda N$, the maximal
achievable rate for the parameters $\lambda \in [0, 1/2]$, call it $C_{\mathrm{Singleton}}(\lambda, \omega)$, is given by

$$C_{\mathrm{Singleton}}(\lambda, \omega) = (1 - \lambda)(1 - 2\omega). \tag{3}$$

Note that $C_{\mathrm{Singleton}}(\lambda, \omega)$ is the maximum achievable rate for a guaranteed error correction in an
adversarial channel model. It is also the maximal achievable rate in a probabilistic setting if we are
limited to bounded distance decoding. Remarkably, Kötter and Kschischang found a generalization of
Reed-Solomon codes that achieves this bound.

3 Main Results 

We are interested in a probabilistic (as opposed to adversarial) channel model. More precisely, we assume
that in (2) the perturbation $z$ is chosen uniformly at random from all matrices in $(\mathbb{F}_q)^{\ell \times m}$ of rank $\ell\omega$.
We assume that the parameters $\lambda$ and $\omega$ are fixed and consider the behavior of the channel as we increase
$N$. We refer to our channel model as the symmetric network coding channel with parameters $\lambda$ and $\omega$,
denoted by SNC($\lambda, \omega$).

Proposition 3.1 (Channel Capacity). The capacity of SNC($\lambda, \omega$) is

$$C(\lambda, \omega) = 1 - \lambda - \omega + \lambda\omega^2. \tag{4}$$

Discussion: In the definition of capacity we implicitly assume that the normalized error weight $\omega$ is not a
function of $N$. Depending on the underlying physical error mechanism this may or may not be the case.
Note that for small $\omega$, $C(\lambda, \omega) \approx 1 - \lambda - \omega$, whereas $C_{\mathrm{Singleton}}(\lambda, \omega) = 1 - \lambda - 2(1 - \lambda)\omega$. Fig. 1 compares
$C(\lambda, \omega)$ with $C_{\mathrm{Singleton}}(\lambda, \omega)$ and shows the points that are achievable according to Theorem 3.2.
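
The comparison between the two rates can be checked numerically with a short sketch that evaluates formulas (3) and (4) exactly. The function names are hypothetical, and the Singleton-type formula is only evaluated for $\lambda \le 1/2$, the range for which the text states it.

```python
from fractions import Fraction


def capacity(lam, omega):
    """Capacity of SNC(lam, omega), formula (4)."""
    return 1 - lam - omega + lam * omega ** 2


def singleton_rate(lam, omega):
    """Singleton-type bound, formula (3), stated for lam in [0, 1/2]."""
    return (1 - lam) * (1 - 2 * omega)


def gap(lam, omega):
    """Rate gained by the probabilistic model over the adversarial one."""
    return capacity(lam, omega) - singleton_rate(lam, omega)


if __name__ == "__main__":
    lam = Fraction(1, 6)  # the value used in Fig. 1
    for k in range(0, 6):
        omega = Fraction(k, 10)
        print(omega, capacity(lam, omega), singleton_rate(lam, omega))
```

Expanding the difference gives $\omega(1 - 2\lambda + \lambda\omega)$, which is non-negative whenever $\lambda \le 1/2$, in line with the first-order comparison in the discussion.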

Theorem 3.2 (Capacity-Achieving Iterative Code Construction). For any $\lambda, \omega \in (0, 1)$ such that
$(1 - \lambda)/\lambda$ is an integer multiple of $\omega$, any $R < C(\lambda, \omega)$, and any $\pi > 0$ there exists an error-correcting code
and a decoding algorithm that achieves symbol error probability smaller than $\pi$, with $O(N^4 \log\log(1/\pi))$
decoding complexity.

Figure 1: Comparison of $C(\lambda, \omega)$ (solid line) with $C_{\mathrm{Singleton}}(\lambda, \omega)$ (dotted line) for $\lambda = 1/6$. The points on
the curve $C(\lambda, \omega)$ that are achievable by the low-complexity iterative scheme are shown as dots.

Discussion: The complexity of the scheme is given as $O(N^4)$. But note that the number of transmitted
information symbols is $N^2\lambda R$. Therefore, if we measure the complexity per transmitted information
symbol then it is only quadratic.

Note also that the complexity scales much better with the target error probability than for usual
sparse graph codes (where it is at least linear in $\log(1/\pi)$).

3.1 Code Construction 

Figure 2: Coding scheme. The last $\omega\ell$ rows of $x$ are zero. The first $(1 - \omega)\ell$ rows obey a set of linear constraints
represented by the bipartite graph shown on the right-hand side.

Fig. 2 shows our coding scheme. Each row corresponds to a packet of length $N$. The $\ell \times \ell$ identity matrix
$\mathbf{1}$ is shown on the left-hand side, whereas the $\ell \times m$ matrix on the right-hand side represents $x$. Each $x$
corresponds to a codeword of $\mathfrak{C}$. Not all $x$ are allowed. Here are the constraints that $x$ must fulfill to
be a codeword. The bottom $\omega\ell$ rows are identical to zero. The top $(1 - \omega)\ell$ rows are constrained by a
linear system of equations. These are indicated by the bipartite graph on the right-hand side, according
to the standard graphical representation used for low-density parity-check codes [5, 13]. More precisely,
we have



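$$\mathbb{H}\, (x_1, x_2, \dots, x_{(1-\omega)\ell})^{\mathsf{T}} = 0. \tag{5}$$

Here $x_i$ denotes the $i$-th row of $x$.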



The matrix $\mathbb{H}$ has the following structure. Start with a "sparse" $((1 - \omega)\ell r) \times ((1 - \omega)\ell)$ $\{0, 1\}$-valued
matrix $H$. The matrix $H$ has exactly 2 non-zero entries along each column. Further, the fraction of rows
that contain exactly $i$ non-zero entries is equal to $P_i$, where $P(x) = \sum_i P_i x^i$ is a given degree distribution
(in particular, it fulfills $P_i \ge 0$ and $P(1) = 1$). In the following, we shall say that $P$ has bounded support
if $P_i = 0$ for $i$ larger than some $n_{\max} < \infty$ or, equivalently, if $P(x)$ is a polynomial.

The matrix $H$ is represented by the graph. Circles (on the right-hand side in Fig. 2) correspond to
the columns of $H$ and squares (on the left-hand side) correspond to the rows of $H$. There is an edge
between a circle and a square iff there is a non-zero entry at the corresponding row and column of $H$.
Following the iterative coding literature, we refer to the circles as the variable nodes, to the squares as
the check nodes, and we call this graph a Tanner graph. To get the matrix $\mathbb{H}$ we "lift" $H$ by replacing
each of its non-zero elements by an $m \times m$ invertible matrix with elements in $\mathbb{F}_q$. We can visualize this
by attaching these invertible matrices as labels to the corresponding edges.

We claim that for any choice of the matrix $\mathbb{H}$ compatible with the degree distribution $P(x)$ the rate
of the code is at least

$$R(\omega, \lambda, P) = (1 - \lambda)(1 - \omega)\left(1 - \frac{2}{P'(1)}\right). \tag{6}$$

To see this, note that the matrix $x$ is of dimension $\ell \times m$ and has entries in $\mathbb{F}_q$. Since the last $\omega\ell$
rows have to be zero this reduces the degrees of freedom by $m\omega\ell$. Further, there are $m(1 - \omega)\ell \frac{2}{P'(1)}$
linear constraints, taking away at most that many further degrees of freedom (and possibly fewer because
of linear dependencies). We get the claim by dividing the remaining degrees of freedom by $N\ell$, in
accordance with (1).
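
The counting argument can be replayed with exact arithmetic. The sketch below computes the design rate (6) and, separately, the same quantity by subtracting the constraints from the $\ell m$ free entries of $x$. The function names are hypothetical and the degree distribution is passed as a dictionary mapping a row degree to the fraction of rows with that degree.

```python
from fractions import Fraction


def mean_degree(P):
    """P'(1): the average row degree of the distribution {degree: fraction}."""
    return sum(Fraction(i) * Fraction(p) for i, p in P.items())


def design_rate(lam, omega, P):
    """Lower bound (6) on the rate."""
    return (1 - lam) * (1 - omega) * (1 - 2 / mean_degree(P))


def rate_by_counting(N, ell, omega, P):
    """Same quantity, obtained by counting degrees of freedom."""
    m = N - ell
    free = ell * m                                       # entries of x
    free -= m * omega * ell                              # zero rows
    free -= m * (1 - omega) * ell * 2 / mean_degree(P)   # linear constraints
    return free / (N * ell)


if __name__ == "__main__":
    P = {4: 1}  # every check row has degree 4 (illustrative choice)
    print(design_rate(Fraction(1, 4), Fraction(1, 3), P))
    print(rate_by_counting(48, 12, Fraction(1, 3), P))
```

With $N = 48$, $\ell = 12$ (so $\lambda = 1/4$), $\omega = 1/3$ and all rows of degree 4, both routes give a rate of $1/4$.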

So far we have explained how to construct a code. We define an ensemble of codes by (i) picking a
matrix $H$ uniformly from all matrices that have degree profile $P(x)$ according to the configuration model
and (ii) picking the labels (the $m \times m$ invertible matrices) for all edges uniformly and independently
for each edge. We denote the resulting ensemble by $\mathcal{C}(N, \lambda, \omega, P(x))$.
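
Step (i) of the ensemble definition can be sketched as follows. This is a minimal illustration of the configuration model under the assumption that the check degrees are given explicitly as a list; it pairs up check "sockets" two at a time, one pair per variable node, and it may produce double edges, as is usual for this model. The edge labels of step (ii) are omitted, and the function name is hypothetical.

```python
import random
from collections import Counter


def sample_tanner_graph(check_degrees, rng):
    """Configuration model with all variable nodes of degree 2.

    check_degrees[a] is the degree of check node a. Returns a list of
    (variable, check) edges; variable k // 2 owns sockets k and k + 1.
    """
    sockets = [a for a, d in enumerate(check_degrees) for _ in range(d)]
    if len(sockets) % 2:
        raise ValueError("total check degree must be even")
    rng.shuffle(sockets)  # uniformly random matching of sockets
    return [(k // 2, a) for k, a in enumerate(sockets)]


if __name__ == "__main__":
    edges = sample_tanner_graph([3, 3, 4, 2], random.Random(0))
    print(edges)
    print(Counter(a for _, a in edges))  # reproduces the check degrees
```

Every variable node ends up with exactly two edges and every check node with its prescribed degree, which are the two structural properties the construction relies on.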

Discussion: In Fig. 2 all linear constraints are on the rows of $x$. An entirely equivalent formulation is
to apply the linear constraints to the columns of $x$ instead; i.e., set the last $\omega m$ columns of $x$ to zero and
apply a set of linear constraints on the first $(1 - \omega)m$ columns of $x$. All subsequent statements apply also
to this case and yield identical results if we let $N$ tend to infinity. For the sake of simplicity, we limit
our discussion to the scheme of Fig. 2. In a practical implementation, however, there can be reasons to
prefer one scheme over the other. For instance, the iterative decoder discussed in the next section might
be more effective on a larger Tanner graph. This suggests using the construction in Fig. 2 if $\ell > m$
and the 'transposed' one otherwise.

3.2 Encoding and Decoding Algorithm 

Assume that the parameters of the model ($N$, $\lambda$, $\omega$, and $P(x)$) are fixed and that we have chosen
one particular code from the ensemble $\mathcal{C}(N, \lambda, \omega, P(x))$. At the source we are given $RN\ell$ symbols over
$\mathbb{F}_q$ (the information we want to transmit). We need to map each of these $q^{RN\ell}$ possible information
vectors to a distinct codeword $x$. This is the encoding task. In principle this can be done by solving a
linear system of equations, starting with (5). A brute force approach, however, has complexity O(N^).
Fortunately, one can exploit the sparseness of the matrix $\mathbb{H}$ to reduce the encoding complexity to O(N^).
The basic idea is to bring $\mathbb{H}$ into upper-triangular form by using only row and column permutations but
no algebraic operations. As proved in [12], this can be done with high probability if $P''(1)/P'(1) > 1$.
We will see in Section 4 (cf. Lemma 4.5) that this condition is always fulfilled. Further details on the
efficient implementation of the encoder will be discussed in a forthcoming publication. We are currently
mainly concerned with the decoding problem.

The receiver sees the perturbed matrix $y$. An equivalent description of our channel model is the
following. Each row of $y$ is the result of adding to the corresponding row of $x$ a uniformly random
element of a subspace $W$ of $(\mathbb{F}_q)^m$. The subspace $W$ is itself uniformly random under the condition
$\dim(W) = \omega\ell$.²

Recall that by assumption the last $\ell\omega$ rows of $x$ are zero. In fact, in order to achieve reliable
transmission we need to modify the scheme described so far and set the last $\ell\omega'$ rows of $x$ to 0, where
$\omega' > \omega$ is arbitrarily close to $\omega$. This modification reduces the rate by a quantity that can be made
arbitrarily small. Since the perturbation has dimension $\ell\omega$, the last $\ell\omega'$ rows of $y$ will span $W$ with high
probability as $N \to \infty$. A basis of $W$ is then obtained by reducing these rows via Gaussian elimination.
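
The step of learning $W$ from the zero rows can be sketched as follows. The code assumes a prime field (integers modulo `p`) purely for simplicity, and the names `rref_mod_p` and `learn_subspace` are hypothetical. Because the last rows of $x$ are zero, the last rows of $y$ are pure noise, and row reduction of that block yields a basis of their span.

```python
def rref_mod_p(rows, p):
    """Reduced row echelon form over GF(p). Returns (matrix, rank)."""
    a = [[v % p for v in row] for row in rows]
    rank = 0
    for col in range(len(a[0])):
        pivot = next((r for r in range(rank, len(a)) if a[r][col]), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        inv = pow(a[rank][col], -1, p)  # modular inverse (Python 3.8+)
        a[rank] = [(v * inv) % p for v in a[rank]]
        for r in range(len(a)):
            if r != rank and a[r][col]:
                f = a[r][col]
                a[r] = [(x - f * y) % p for x, y in zip(a[r], a[rank])]
        rank += 1
        if rank == len(a):
            break
    return a, rank


def learn_subspace(y, n_tail, p):
    """Basis (in reduced echelon form) of the span of the last n_tail rows of y.

    n_tail must be at least 1; these rows carry no information symbols.
    """
    reduced, rank = rref_mod_p(y[-n_tail:], p)
    return reduced[:rank]


if __name__ == "__main__":
    # Toy example over GF(5): W is spanned by [1,0,2,0] and [0,1,0,3].
    y = [[1, 3, 3, 2],   # information row plus noise
         [1, 1, 2, 3],   # zero row plus noise
         [1, 2, 2, 1]]   # zero row plus noise
    print(learn_subspace(y, 2, 5))
```

In the toy example the two noise rows are independent, so the routine returns both basis vectors of $W$; if they had been dependent it would return a smaller basis, which is why the text pads the number of zero rows slightly.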

We therefore assume hereafter that $W$ is known and, to avoid cumbersome notation, we set $\omega' = \omega$.
The decoding task consists in finding the perturbations for the first $(1 - \omega)\ell$ rows of $y$. If we subtract
these perturbations from $y$, we have found $x$. Throughout the description, given two sets of vectors $U_1$
and $U_2$, we let $U_1 + U_2 = \{u_1 + u_2 : u_1 \in U_1, u_2 \in U_2\}$ and, for a given vector $x$, $x + U = \{x\} + U$.
Finally, given a matrix $\mathfrak{h} \in (\mathbb{F}_q)^{m \times m}$, $U\mathfrak{h} = \{u\mathfrak{h} : u \in U\}$ (vectors are always thought of as row vectors).

We proceed in an iterative fashion. The basic principle is easily understood. We know that $x_i \in y_i + W$
(here $x_i$ and $y_i$ denote the $i$-th rows of $x$ and $y$). In words, we know that $x_i$ lies in a given affine subspace.
Consider a check node $a$ and, without loss of generality, let its neighbors be $1, \dots, d$. Let $\mathfrak{h}_{i,a}$,
$i = 1, \dots, d$, denote the corresponding edge labels. As we discussed earlier, each such edge label is an
$m \times m$ invertible matrix with entries in $\mathbb{F}_q$.

By the definition of the code, $\sum_{i=1}^{d} x_i \mathfrak{h}_{i,a} = 0$. In particular, this means that $x_1 = -\big(\sum_{i=2}^{d} x_i \mathfrak{h}_{i,a}\big)\mathfrak{h}_{1,a}^{-1}$.
Since we know that $x_i \in y_i + W$, this implies that

$$x_1 \in -\big[(y_2 + W)\mathfrak{h}_{2,a} + \cdots + (y_d + W)\mathfrak{h}_{d,a}\big]\mathfrak{h}_{1,a}^{-1}.$$



² As discussed in the introduction, the underlying physical process is the following: we add the headers to the rows of $x$; we
scramble the rows of $M$ by multiplying it by a random invertible matrix in $(\mathbb{F}_q)^{\ell \times \ell}$; we send the resulting packets; the channel
perturbs these packets; the receiver collects the perturbed packets, stacks them up to a matrix $Q$, brings the matrix back into
normal form, and "strips off" the headers.

Since we also know that $x_1 \in y_1 + W$, this implies that

$$x_1 \in (y_1 + W) \cap \Big\{ -\big[(y_2 + W)\mathfrak{h}_{2,a} + \cdots + (y_d + W)\mathfrak{h}_{d,a}\big]\mathfrak{h}_{1,a}^{-1} \Big\}. \tag{7}$$

The actual decoder is most conveniently described (and analyzed) as a 'message passing' algorithm, 
with messages being sent along the edges of the Tanner graph. Messages are affine subspaces of $(\mathbb{F}_q)^m$.
They are sent in rounds. First we send messages from the variable nodes to the check nodes. We process 
the incoming messages at the check nodes and then send messages on all edges from the check nodes to 
the variable nodes. This concludes one iteration of message passing. 

In more detail, the message sent from variable node $i$ to check node $a$ in the $t$-th iteration is an affine
subspace $W^{(t)}_{i \to a}$ of $(\mathbb{F}_q)^m$. If variable node $i$ is connected to check node $a$, let $a'$ denote the second check
node that is connected to $i$ (recall that each variable node has exactly two neighbors). Variable nodes
do not perform any non-trivial processing of the messages, and check-to-variable node messages coincide
with variable-to-check ones: $W^{(t)}_{i \to a} = W^{(t)}_{a' \to i}$.

For $t = 0$ we have $W^{(0)}_{i \to a} = y_i + W$ for all variable nodes $i$ and all check nodes $a$. Further, let $\partial a$
denote the set of all neighbors of a check node $a$. According to the above discussion, we apply for $t \ge 0$ the
recursion

$$W^{(t+1)}_{a \to i} = (y_i + W) \cap \Big\{ -\Big[ \sum_{j \in \partial a \setminus i} W^{(t)}_{j \to a}\, \mathfrak{h}_{j,a} \Big] \mathfrak{h}_{i,a}^{-1} \Big\}. \tag{8}$$

If, after some iterations, $\dim\big(W^{(t)}_{a \to i} \cap W^{(t)}_{a' \to i}\big) = 0$, then we have determined the $i$-th row of $x$, namely
the unique element of this intersection.
Our (main) Theorem 3.2 affirms that, for given parameters $\lambda$ and $\omega$, the degree distribution $P(x)$
can be chosen in such a way that the rate of the overall code approaches the capacity arbitrarily closely
and that the decoder succeeds with high probability when the packet size tends to infinity.



4 Proofs 

In the next section we state a few auxiliary lemmas on the behavior of the message-passing decoder and
prove Theorem 3.2. The lemmas are then proved in Section 4.2. Finally, the capacity of the network
coding channel is computed in Section 4.3.



4.1 Auxiliary Results and Proof of the Main Theorem 

To start, we can simplify our proof in two ways. First, by symmetry of the channel and the message-passing
rules, we can assume that the all-zero matrix $x$ was transmitted and we need only analyze the
behavior of the decoder for this case. Notice that, under this assumption, the messages $W^{(t)}_{i \to a}$ are linear
subspaces (as they must contain the transmitted vectors $x_i = 0$). Second, as we discussed in Section 3.2,
the first step of the decoding procedure consists of learning the perturbing subspace $W$. Because of the
special structure of the matrix $x$ (the last $\omega\ell$ rows are zero) this is accomplished by a simple inspection.
We therefore assume in all that follows that $W$ is known and that the all-zero matrix was transmitted.

Throughout this section we let $P$ be a distribution over the integers and let $G$ be a random multi-graph
over $\ell(1 - r)$ nodes with degree distribution $P$. The graph $G$ is drawn according to the configuration
model and the code is constructed from $G$ as described in the previous section. Since variable nodes
have degree 2, we can think of $G$ either as a multi-graph over the check nodes, or as a bipartite graph
over check and variable nodes.

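It is also useful to define the 'edge perspective' degree distribution

$$\rho_n = \frac{n P_n}{P'(1)}, \qquad \rho(x) = \sum_{n \ge 1} \rho_n x^{n-1} = \frac{P'(x)}{P'(1)}. \tag{9}$$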



For a uniformly random edge in $G$, let $W^{(t)}$ be the associated message (that, we recall, is an affine
subspace in $(\mathbb{F}_q)^m$). The key step in the analysis is to notice that the dimension of $W^{(t)}$ satisfies a simple
recursion.

First consider $n - 1$ independent and uniformly random linear subspaces $V_1, \dots, V_{n-1} \subseteq (\mathbb{F}_q)^m$ of
dimensions $d_1, \dots, d_{n-1}$, respectively. Let $V$ be a fixed subspace of $\dim(V) = D$, and define

$$K^{(n)}_{m,D}(d \mid d_1, \dots, d_{n-1}) = \mathbb{P}\{\dim(V \cap (V_1 + \cdots + V_{n-1})) = d\}. \tag{10}$$

The probability kernel $K^{(n)}_{m,D}$ admits an explicit albeit cumbersome expression in terms of Gauss
polynomials. Fortunately, we do not need its exact description in the following.

We define a sequence of integer-valued random variables $\{D^{(t)}\}_{t \ge 0}$ recursively as follows. For $t = 0$
we let $D^{(0)} = \ell\omega$ identically. For $t \ge 0$, choose $n$ with distribution $\rho_n$, and draw $D^{(t)}_1, \dots, D^{(t)}_{n-1}$ iid copies
of $D^{(t)}$. Then, the probability of $D^{(t+1)} = d$, conditioned on the values $D^{(t)}_1 = d_1, \dots, D^{(t)}_{n-1} = d_{n-1}$,
coincides with Eq. (10) where $D = \ell\omega$. In formulae,

$$\mathbb{P}\{D^{(t+1)} = d\} = \sum_{n \ge 1} \rho_n \sum_{d_1 \dots d_{n-1}} K^{(n)}_{m,\ell\omega}(d \mid d_1, \dots, d_{n-1})\, \mathbb{P}\{D^{(t)} = d_1\} \cdots \mathbb{P}\{D^{(t)} = d_{n-1}\}. \tag{11}$$

The sequence $\{D^{(t)}\}$ accurately tracks the dimension of $W^{(t)}$ as stated below.

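Lemma 4.1 (Density Evolution on a Graph versus Density Evolution on a Tree). For any degree
distribution $P$ with bounded support (i.e., such that $P_n = 0$ for all $n$ large enough) and any $t \in \mathbb{N}$ there
exists a sequence $\epsilon(\ell, t)$ with $\epsilon(\ell, t) \downarrow 0$ as $\ell \to \infty$, such that, for any $m$ and $\ell$,

$$\big\| \mathbb{P}\{\dim(W^{(t)}) \in \cdot\,\} - \mathbb{P}\{D^{(t)} \in \cdot\,\} \big\|_{\mathrm{TV}} \le \epsilon(\ell, t), \tag{12}$$

where we recall that $\|P_X - P_Y\|_{\mathrm{TV}} = \sup_A |\mathbb{P}(X \in A) - \mathbb{P}(Y \in A)|$.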
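
For distributions on a finite set, such as the distribution of a dimension, the supremum in the total variation distance is attained on the set where one distribution exceeds the other, so it equals half the sum of absolute differences. The following sketch uses that standard identity; the function name and the dictionary representation of a distribution are assumptions made for illustration.

```python
def tv_distance(p, q):
    """Total variation distance between two finite distributions.

    p and q map outcomes (e.g. dimensions) to probabilities.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


if __name__ == "__main__":
    # Two hypothetical distributions of a dimension taking values 0, 1, 2.
    graph = {0: 0.70, 1: 0.20, 2: 0.10}
    tree = {0: 0.75, 1: 0.15, 2: 0.10}
    print(tv_distance(graph, tree))
```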

Controlling the sequence of random variables $\{D^{(t)}\}_{t \ge 0}$ is quite difficult. Luckily, its behavior
simplifies considerably if we let $m \to \infty$ and consider the scaled dimensions $D^{(t)}/(\ell\omega)$.

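More precisely, we define the sequence of random variables $\{\delta^{(t)}\}_{t \ge 0}$ with values in $[0, 1]$ recursively
as follows. We let $\delta^{(0)} = 1$ identically. For any $t \ge 0$, let $n$ be drawn with distribution $\rho_n$, and
$\delta^{(t)}_1, \dots, \delta^{(t)}_{n-1}$ be iid copies of $\delta^{(t)}$. Further, for $a, b, x \in \mathbb{R}$ with $a < b$, define $[x]_a^b = \min(\max(x, a), b)$.
Then, the distribution of $\delta^{(t+1)}$ is given by

$$\delta^{(t+1)} \stackrel{d}{=} \Big[ \delta^{(t)}_1 + \cdots + \delta^{(t)}_{n-1} + 1 - \frac{1 - \lambda}{\lambda\omega} \Big]_0^1. \tag{13}$$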

We will prove that the rescaled dimensions $D^{(t)}/(\ell\omega)$ are accurately tracked by $\delta^{(t)}$.

Lemma 4.2 (Density Evolution versus Rescaled Density Evolution). For any $n_{\max}$, $\omega$, and $\lambda$ there
exists $\epsilon > 0$ such that, for any degree distribution $P$ with support in $[0, n_{\max}]$,

$$\lim_{m \to \infty} \mathbb{P}\{D^{(t+1)} > 0\} \le n_{\max}\, \mathbb{P}\{\delta^{(t)} > \epsilon\}. \tag{14}$$

The previous lemma shows that it suffices to consider the behavior of $\delta^{(t)}$, for which we have the
explicit simple recursion (13). Even so, finding a degree distribution $\rho$ which results in codes of large
rates and so that $\delta^{(t)}$ converges to 0 for large values of $t$ seems challenging. The key to our analysis is
the observation that the recursion simplifies significantly if $(1 - \lambda)/\lambda$ is an integer multiple of $\omega$. In
this case the distribution of $\delta^{(t)}$ trivializes: $\delta^{(t)}$ only takes on the values 0 or 1 regardless of the degree
distribution $\rho$. Density evolution therefore collapses to a scalar recursion, making it possible to find the
optimum degree distribution $\rho$.
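
The collapse to a scalar recursion can be sketched numerically. The sketch assumes that the rescaled recursion has the clipped-sum form $[\delta_1 + \cdots + \delta_{n-1} + 1 - K]_0^1$ with $K = (1 - \lambda)/(\lambda\omega)$ an integer. Under that assumption each $\delta$ is 0 or 1, and the output equals 1 exactly when at least $K$ of the $n - 1$ inputs equal 1, so the probability $x_t$ that $\delta^{(t)} = 1$ obeys a binomial-tail recursion. The function names and the example edge distribution are hypothetical.

```python
from math import comb


def binom_tail(n, x, k):
    """P{Binomial(n, x) >= k}."""
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j)
               for j in range(max(k, 0), n + 1))


def de_step(x, rho, K):
    """One step of the scalar recursion.

    rho maps a check degree n to its edge-perspective probability rho_n;
    x is the current probability that a rescaled dimension equals 1.
    """
    return sum(r * binom_tail(n - 1, x, K) for n, r in rho.items())


def iterate(rho, K, steps):
    """Trajectory x_0 = 1, x_1, ..., x_steps."""
    x = 1.0
    history = [x]
    for _ in range(steps):
        x = de_step(x, rho, K)
        history.append(x)
    return history


if __name__ == "__main__":
    # Illustrative edge distribution: half the edges touch degree-3 checks,
    # half touch degree-6 checks, with K = 3.
    for t, x in enumerate(iterate({3: 0.5, 6: 0.5}, 3, 8)):
        print(t, x)
```

In this toy example checks of degree 3 resolve immediately (they have fewer than $K$ other neighbors), which starts the decay; the remaining checks then resolve progressively faster as $x_t$ shrinks.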

Lemma 4.3 (Capacity Achieving Degree Distributions for Rescaled Density Evolution). Let $\lambda, \omega \in (0, 1)$
be such that $(1 - \lambda)/\lambda$ is an integer multiple of $\omega$ and let $r < C(\lambda, \omega)/((1 - \lambda)(1 - \omega))$. Then there exists
$\rho$ with bounded support and $1 - 2\int_0^1 \rho(x)\,dx > r$, and two constants $A > 0$, $\gamma > 1$ such that, for any $t$,
$\epsilon > 0$,

$$\mathbb{P}\{\delta^{(t)} > \epsilon\} \le \exp\{-A\gamma^t\}. \tag{15}$$

Proof of the Main Theorem 3.2. Let $\lambda, \omega, R$ be as in the statement of the theorem and
$r \in \big(R/((1 - \lambda)(1 - \omega)),\, C(\lambda, \omega)/((1 - \lambda)(1 - \omega))\big)$. We claim that there exists a degree distribution $P$ with support
in $[0, n_{\max}]$, with $1 - 2/P'(1) > r$ (equivalently, from the edge perspective, $1 - 2\int_0^1 \rho(x)\,dx > r$) such
that the iterative decoder achieves error probability smaller than $\pi$ in $O(\log\log(1/\pi))$ iterations. Let us
check that this indeed proves the theorem. As mentioned above, the perturbation subspace $W$ (i.e., the
linear subspace of $(\mathbb{F}_q)^m$ spanned by the rows of $z$) can be inferred with high probability from the last $\omega'\ell$
rows of the output $y$. This requires Gaussian elimination of an $m \times (\ell\omega')$ matrix with elements in $\mathbb{F}_q$, which
can be accomplished at a cost of $O(N^3)$ operations.

The rest of the codeword $x$ is decoded by message passing. Each iteration requires updating $O(N)$
messages (because $P$ has bounded support). Each update, cf. Eq. (8), requires finding a basis for a space
spanned by at most $(\ell\omega) n_{\max}$ vectors in $(\mathbb{F}_q)^m$. This can be done, again via Gaussian elimination, in
$O(N^3)$ operations. We thus get $O(N^4)$ operations per iteration. Since running $O(\log\log(1/\pi))$ iterations
achieves error probability smaller than $\pi$, this implies the thesis.

Let us now prove this claim. First, we fix the degree distribution in such a way that Lemma 4.3
holds for some $A > 0$, $\gamma > 1$. We let $t_*(\pi) = O(\log\log(1/\pi))$ be such that
$\mathbb{P}\{\delta^{(t)} > \epsilon\} \le \exp\{-A\gamma^t\} \le \pi/(3 n_{\max})$ for any $t \ge t_*(\pi)$. Then, for any fixed $t \ge t_*(\pi)$ the decoded
error probability is upper bounded by $\pi$ if $N$ is large enough.

Indeed, $\epsilon$ can be chosen in such a way that Lemma 4.2 holds and therefore, for $m$ large enough,
$\mathbb{P}\{D^{(t+1)} > 0\} \le n_{\max}\pi/(3 n_{\max}) + \pi/3 \le 2\pi/3$. The $i$-th row of codeword $x$ is decoded correctly if
either of the two messages $W^{(t+1)}_{a \to i}$ or $W^{(t+1)}_{a' \to i}$ has dimension 0. Therefore, the symbol error probability is
upper bounded by $\mathbb{P}\{\dim(W^{(t+1)}) > 0\}$. By Lemma 4.1, for $\ell$ large enough, this is at most
$\mathbb{P}\{D^{(t+1)} > 0\} + \pi/3 \le \pi$, which proves the theorem. □

4.2 Proofs of Lemmas 

Proof of Lemma 4.1. The proof is based on the 'density evolution' technique [13], and on some remarks
that allow us to simplify the resulting distributional recursion. A similar result appeared already in the
context of erasure decoding for non-binary codes [11]; in order to be self-contained we nevertheless
sketch the proof here.

Let $e$ be a uniformly random directed edge in $G$ and let $W^{(t)}_e$ be the associated message after $t$ iterations
of the message-passing algorithm. Denote by $B(e, t)$ the 'directed neighborhood' of $e$ with radius $t$, i.e.,
the induced sub-graph containing all non-reversing walks in $G$ of length at most $t$ that terminate in $e$.
We regard this as a labeled graph with variable node labels given by the received vectors and edge labels
by the $m \times m$ matrices that define the code. It is well known that such a neighborhood converges to a
(labeled) Galton-Watson tree $T(t)$.

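More precisely, $T(t)$ is a $t$-generation tree rooted in a directed edge and with offspring distribution
$\rho_n$. We have

$$\big\| \mathbb{P}\{B(e, t) \in \cdot\,\} - \mathbb{P}\{T(t) \in \cdot\,\} \big\|_{\mathrm{TV}} \le \epsilon(\ell, t), \tag{16}$$

for some $\epsilon(\ell, t)$ as in the statement of Lemma 4.1.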

Note that the message $W^{(t)}_e$ is a function only of the neighborhood $B(e, t)$. Suppose that we apply
the message-passing algorithm to $T(t)$ and let $W^{(t)}_T$ be the message passed through the root edge after
$t$ iterations. It follows from the definition of total variation distance that

$$\big\| \mathbb{P}\{\dim(W^{(t)}) \in \cdot\,\} - \mathbb{P}\{\dim(W^{(t)}_T) \in \cdot\,\} \big\|_{\mathrm{TV}} \le \epsilon(\ell, t). \tag{17}$$

The proof is completed by showing that $\dim(W^{(t)}_T)$ is distributed as the random variable $D^{(t)}$ defined
recursively by Eq. (11). First, note that $W^{(t)}_T$ is a uniformly random subspace, conditional on its
dimension $\dim(W^{(t)}_T)$. This follows from the message-passing update rule (8) together with the remark
that, given any fixed subspace $W_*$ and a uniformly random full-rank $m \times m$ matrix $L$, $LW_*$ is a uniformly
random subspace with the same dimension as $W_*$.

We prove that $\dim(W^{(t)}_T)$ is distributed as $D^{(t)}$ by recursion. The statement is true for $t = 0$ by
definition of our channel model. Consider the tree $T(t + 1)$ and condition on the offspring number at
the root $n - 1$. Denote by $W^{(t)}_1, \dots, W^{(t)}_{n-1}$ the corresponding messages towards the root and condition
on $\dim(W^{(t)}_1) = d_1, \dots, \dim(W^{(t)}_{n-1}) = d_{n-1}$. Then the distribution of $\dim(W^{(t+1)}_T)$ is given by the
kernel (10) with $D = \ell\omega$ by uniformity of the subspace. The claim follows from the fact that
$W^{(t)}_1, \dots, W^{(t)}_{n-1}$ are iid because of the tree structure. □

In the proof of Lemma 4.2 we require an estimate of the probability that true density evolution
deviates significantly from the rescaled density evolution.

Proposition 4.4 (Deviations from Asymptotic Density Evolution). Let $V_1$ be a subspace of dimension
$d_1$ in $\mathbb{F}_q^m$, and $V_2$ a uniformly random subspace of dimension $d_2$. Define $d_1 \ominus d_2 = \max(0, d_1 + d_2 - m)$,
and $d_1 \boxplus d_2 = \min(m, d_1 + d_2)$. Then

$$\mathbb{P}\{d_1 \ominus d_2 \le \dim(V_1 \cap V_2) \le d_1 \ominus d_2 + k\} \ge 1 - q^{-k - \max(0,\, m - d_1 - d_2)}, \tag{18}$$

$$\mathbb{P}\{d_1 \boxplus d_2 - k \le \dim(V_1 + V_2) \le d_1 \boxplus d_2\} \ge 1 - q^{-k - \max(0,\, m - d_1 - d_2)}. \tag{19}$$

Further, let $V$ be a subspace of dimension $d$ and let $V_1, \dots, V_{n-1}$ be uniformly random subspaces of
dimensions (respectively) $d_1, \dots, d_{n-1}$ and $\bar{d} = [d_1 + \cdots + d_{n-1} + d - m]_0^d$. Then

$$\mathbb{P}\{|\dim((V_1 + \cdots + V_{n-1}) \cap V) - \bar{d}| > k\} \le n q^{-k/n}. \tag{20}$$

Proof. Notice that Eq. (19) follows from Eq. (18) together with the identity $\dim(V_1 + V_2) = d_1 + d_2 -
\dim(V_1 \cap V_2)$. Further $\dim(V_1 \cap V_2) \ge d_1 \ominus d_2$ for any two subspaces $V_1$, $V_2$ of the given dimensions.

We are left with the task of bounding the probability of $\dim(V_1 \cap V_2) \ge d_1 \ominus d_2 + k$. Notice that
this event is identical to $|V_1 \cap V_2| \ge q^{d_1 \ominus d_2 + k}$ (we denote by $|S|$ the cardinality of the set $S$). By the
Markov inequality we have

$$\mathbb{P}\{\dim(V_1 \cap V_2) \ge d_1 \ominus d_2 + k\} \le q^{-(d_1 \ominus d_2) - k}\, \mathbb{E}|V_1 \cap V_2| = q^{-(d_1 \ominus d_2) - k}\, q^{d_1 + d_2 - m}, \tag{21}$$

where the equality on the right-hand side follows by multiplying the number of vectors in $V_1$ (that is
$q^{d_1}$) with the probability that one of them belongs to $V_2$ (by uniformity this is $q^{-m + d_2}$).

Eq. (20) follows by applying the previous bounds recursively. Explicitly, we define $W_1 = V_1$,
$W_2 = W_1 + V_2$, $\dots$, $W_{n-1} = W_{n-2} + V_{n-1}$, and $W_n = W_{n-1} \cap V$. The corresponding (typical) dimensions
are $c_1 = d_1$, $c_2 = c_1 \boxplus d_2$, $\dots$, $c_{n-1} = c_{n-2} \boxplus d_{n-1}$, $c_n = c_{n-1} \ominus d_n = \bar{d}$, where $d_n = d$. By the union bound, with
probability at least $1 - n q^{-k/n}$ we have $|\dim(W_n) - (d_n \ominus \dim(W_{n-1}))| \le k/n$ and
$|\dim(W_i) - (d_i \boxplus \dim(W_{i-1}))| \le k/n$ for $i \in \{2, \dots, n - 1\}$. The thesis follows by the triangle inequality. □
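
The concentration of the intersection dimension around $\max(0, d_1 + d_2 - m)$ can be observed empirically. The sketch below assumes a prime field (integers modulo `p`), represents a subspace by a basis stored as matrix rows, and uses the identity $\dim(V_1 \cap V_2) = d_1 + d_2 - \dim(V_1 + V_2)$. A uniformly random full-rank $d \times m$ matrix is drawn by rejection, and its row space is a uniformly random $d$-dimensional subspace. All function names are hypothetical.

```python
import random
from collections import Counter


def rank_mod_p(rows, p):
    """Rank of a matrix (list of lists of ints) over the prime field GF(p)."""
    a = [[v % p for v in row] for row in rows]
    n_cols = len(a[0]) if a else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(a)) if a[r][col]), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        inv = pow(a[rank][col], -1, p)  # modular inverse (Python 3.8+)
        a[rank] = [(v * inv) % p for v in a[rank]]
        for r in range(len(a)):
            if r != rank and a[r][col]:
                f = a[r][col]
                a[r] = [(x - f * y) % p for x, y in zip(a[r], a[rank])]
        rank += 1
        if rank == len(a):
            break
    return rank


def random_subspace_basis(d, m, p, rng):
    """Rejection-sample a d x m matrix of rank d over GF(p)."""
    while True:
        basis = [[rng.randrange(p) for _ in range(m)] for _ in range(d)]
        if rank_mod_p(basis, p) == d:
            return basis


def intersection_dim(b1, b2, p):
    """dim(V1 ∩ V2) from bases b1, b2: d1 + d2 - dim(V1 + V2)."""
    return len(b1) + len(b2) - rank_mod_p(b1 + b2, p)


def typical_intersection_dim(d1, d2, m):
    """The value max(0, d1 + d2 - m) around which the dimension concentrates."""
    return max(0, d1 + d2 - m)


if __name__ == "__main__":
    rng = random.Random(3)
    p, m, d1, d2 = 5, 8, 5, 6
    counts = Counter(
        intersection_dim(random_subspace_basis(d1, m, p, rng),
                         random_subspace_basis(d2, m, p, rng), p)
        for _ in range(200))
    print(typical_intersection_dim(d1, d2, m), counts)
```

The observed dimension never falls below the typical value, and excesses become rarer as the field size grows, in line with the $q^{-k}$ decay in (18).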

Proof of Lemma 4.2. We will first prove that there exists a coupling between $D^{(t)}$ and $\delta^{(t)}$ such that
$|D^{(t)} - (\ell\omega)\delta^{(t)}| \le \ell\epsilon$ with high probability as $\ell, m \to \infty$ (with $\lambda$, $\omega$ fixed). Subsequently, we shall prove
that this claim implies the thesis.

The coupling is constructed recursively. For $t = 0$ we have $D^{(0)} = (\ell\omega)\delta^{(0)} = \ell\omega$ deterministically.
This defines the coupling of $D^{(t)}$ and $\delta^{(t)}$ for $t = 0$. Assume we have shown how to construct a coupling
of $D^{(t)}$ and $\delta^{(t)}$ for some $t \in \mathbb{N}$. To define the coupling for $t + 1$ we draw an integer $n$ with distribution $\rho_n$.
We then generate $n - 1$ coupled pairs $(D^{(t)}_i, \delta^{(t)}_i)$. From those we generate a coupled pair $(D^{(t+1)}, \delta^{(t+1)})$
via the recursions (11) and (13), respectively.

In order to prove the claim it is sufficient to show the following. If $V_1, \dots, V_{n-1}$ are uniformly random subspaces of dimensions $(\ell\omega)\xi_1, \dots, (\ell\omega)\xi_{n-1}$ in $F^m$, and if $V$ has dimension $\ell\omega$, then, with high probability, $|\dim((V_1 + \cdots + V_{n-1}) \cap V) - (\ell\omega)\xi| \le \ell\varepsilon$ for any $\varepsilon > 0$. This in turn follows from Proposition 4.4 (Eq. (20)) together with the observation that the degree $n$ is bounded.

Let us now consider the thesis of the lemma. We can assume without loss of generality that $n_{\max} > 1$ and $m > \ell\omega$, whence $1 - \lambda > \lambda\omega$ follows. Let $n_{\max} \ge 2$ be the largest integer in the support of $\rho_n$ and take $\varepsilon > 0$ small enough so that $2(n_{\max} - 1)\varepsilon < (1 - \lambda)/(\lambda\omega) - 1 - \gamma$ for some $\gamma > 0$. Draw $n_{\max}$ iid copies of $D^{(t)}$, denoted $D_1^{(t)}, \dots, D_{n_{\max}}^{(t)}$. Since under the coupling $|D^{(t)} - (\ell\omega)\xi^{(t)}| \le \ell\varepsilon$ with high probability,

$$ \mathbb{P}\{\max(D_1^{(t)}, \dots, D_{n_{\max}}^{(t)}) > 2\varepsilon(\ell\omega)\} \le n_{\max}\, \mathbb{P}\{\xi^{(t)} > \varepsilon\} + o_n(1). \qquad (22) $$

Now draw $n$ with distribution $\rho_n$ and $D^{(t+1)}$ conditional on $D_1^{(t)}, \dots, D_{n-1}^{(t)}$ according to the corresponding kernel. Namely, $D^{(t+1)}$ is the dimension of $V \cap (V_1 + \cdots + V_{n-1})$ when $\dim(V) = \ell\omega$ and $V_1, \dots, V_{n-1}$ are uniformly random subspaces of $F^m$ with dimensions $D_1^{(t)}, \dots, D_{n-1}^{(t)}$.

Let $W = V_1 + \cdots + V_{n-1}$. Then $W$ is uniformly random conditional on its dimension, and $\dim(W) \le D_1^{(t)} + \cdots + D_{n-1}^{(t)} \le \ell(1 - \lambda - \lambda\omega)/\lambda - \ell\omega\gamma$ with probability lower bounded as in Eq. (22). Assume this to be the case. By Proposition 4.4, Eq. (18), and recalling that $m = \ell(1 - \lambda)/\lambda$, the probability that $D^{(t+1)} = \dim(V \cap W) > 0$ is at most $q^{-\ell\omega\gamma}$. This proves the thesis. □

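In order to prove our last auxiliary result, Lemma 4.3, we need some algebraic properties of the edge-perspective capacity-achieving degree distribution and of the corresponding generating function:

$$ \rho_k^*(x) = \sum_{i=k+1}^{\infty} \frac{k-1}{(i-1)(i-2)}\, x^{i-1} \equiv \sum_{j \ge k+1} \rho^*_{k,j}\, x^{j-1}. \qquad (23) $$

Lemma 4.5 (Basic Properties of Capacity-Achieving Degree Distribution). Let $k \in \mathbb{N}$ and define $f_{k,i}(\alpha) = \sum_{j=k}^{i-1} \binom{i-1}{j} \alpha^j (1-\alpha)^{i-1-j}$. Then $\rho_k^*(1) = 1$, $d\rho_k^*(x)/dx\,|_{x=1} \ge k$, $\int_0^1 \rho_k^*(x)\,dx = 1/(2k)$, and $\sum_{i \ge k+1} \rho^*_{k,i} f_{k,i}(\alpha) = \alpha$.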



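Proof. By a reordering of the terms in the sum,

$$ \rho_k^*(1) = \lim_{J\to\infty} \sum_{i=k+1}^{k+J} \frac{k-1}{(i-1)(i-2)} = \lim_{J\to\infty} \sum_{i=k+1}^{k+J} \left(\frac{k-1}{i-2} - \frac{k-1}{i-1}\right) = \lim_{J\to\infty} \left(1 - \frac{k-1}{k+J-1}\right) = 1. $$

In a similar manner, we have

$$ \int_0^1 \rho_k^*(x)\,dx = \lim_{J\to\infty} \sum_{i=k+1}^{k+J} \frac{k-1}{i(i-1)(i-2)} = \lim_{J\to\infty} \sum_{i=k+1}^{k+J} \frac{k-1}{2}\left(\frac{1}{(i-1)(i-2)} - \frac{1}{i(i-1)}\right) = \frac{1}{2k}. $$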



The claim $d\rho_k^*(x)/dx\,|_{x=1} \ge k$ follows since $\rho_k^*(1) = 1$, $\rho^*_{k,j} \ge 0$, and since $\rho_k^*(x)$ only contains powers of $x$ of at least $k$. In order to prove the last assertion we recall the identity [15]

n—i ^ ^ ^ 

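We then obtain (here $\bar\alpha = 1 - \alpha$):

$$ \sum_{i=k+1}^{\infty} \rho^*_{k,i}\, f_{k,i}(\alpha) = (k-1) \sum_{j=k}^{\infty} \left(\frac{\alpha}{\bar\alpha}\right)^{j} \sum_{i=j+1}^{\infty} \binom{i-1}{j} \frac{\bar\alpha^{\,i-1}}{(i-1)(i-2)} = (k-1) \sum_{j=k}^{\infty} \frac{\alpha}{j(j-1)} = \alpha, $$

where we applied the identity obtained by integrating Eq. (24) twice with respect to $x$. □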
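The three quantitative claims of Lemma 4.5 can be checked numerically by truncating the infinite sums. The sketch below is illustrative only: it assumes the coefficients $\rho^*_{k,i} = (k-1)/((i-1)(i-2))$ for $i \ge k+1$ read off from Eq. (23), interprets $f_{k,i}(\alpha)$ as the probability that a Binomial$(i-1, \alpha)$ variable is at least $k$, and requires $k \ge 2$. The names `rho_star`, `f` and `check` and the cutoff `i_max` are hypothetical choices made here.

```python
from math import comb

def rho_star(k, i):
    """Coefficient of x^(i-1) in the capacity-achieving distribution (k >= 2)."""
    return (k - 1) / ((i - 1) * (i - 2)) if i >= k + 1 else 0.0

def f(k, i, alpha):
    """P{Binomial(i-1, alpha) >= k}, via the complement of the lower tail."""
    n = i - 1
    below = sum(comb(n, j) * alpha**j * (1 - alpha)**(n - j) for j in range(k))
    return 1.0 - below

def check(k, alpha, i_max=20000):
    """Truncated versions of rho*(1), the integral of rho*, and sum rho*_i f_i."""
    idx = range(k + 1, i_max + 1)
    total = sum(rho_star(k, i) for i in idx)
    integral = sum(rho_star(k, i) / i for i in idx)
    fixed_point = sum(rho_star(k, i) * f(k, i, alpha) for i in idx)
    return total, integral, fixed_point
```

Calling `check(3, 0.4)` returns three numbers that should be close to $1$, $1/(2k) = 1/6$ and $\alpha = 0.4$ respectively. The first and third converge slowly, since the neglected tail of the sum is of order $(k-1)/i_{\max}$.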



Proof of Lemma 4.3. Let $k = (1 - \lambda)/(\lambda\omega)$, $k \in \mathbb{N}$. Then $C(\lambda, \omega)/((1 - \omega)(1 - \lambda)) = 1 - 1/k$.

It is clear from the recursive definition (13), together with the initial condition $\xi^{(0)} = 1$, that for any $t \ge 0$, $\xi^{(t)}$ only takes the values 0 and 1. Let $\alpha_t = \mathbb{P}\{\xi^{(t)} = 1\}$. Then $\alpha_0 = 1$, and the same recursion implies that

$$ \alpha_{t+1} = \sum_{n=k+1}^{\infty} \rho_n f_{k,n}(\alpha_t) = F_{k,\rho}(\alpha_t), \qquad (25) $$

where $f_{k,n}(\alpha)$ is defined as in the statement of Lemma 4.3 (note that $f_{k,k}(\alpha) = 0$). We claim that for any $r < 1 - 1/k$ there exists an edge-perspective degree distribution $\rho$ of bounded support such that: (i) $1 - 2\int_0^1 \rho(x)\,dx \ge r$; (ii) $F_{k,\rho}(\alpha) < \alpha$ for any $\alpha \in (0, 1]$; (iii) $F_{k,\rho}(\alpha) = O(\alpha^k)$ as $\alpha \downarrow 0$. Then the lemma follows by standard calculus, with $\gamma \in (1, k)$ and $A$ sufficiently small.

In order to exhibit such a degree distribution, fix $b \in \mathbb{N}$, $b > k$, and define $\rho(x) = \sum_{i=k}^{b} \rho_i x^{i-1}$, where $\rho_i = 0$ except for $\rho_i = \rho^*_{k,i}$, $i = k+1, \dots, b$, and $\rho_k = 1 - \sum_{i=k+1}^{b} \rho^*_{k,i}$. Then

$$ \int_0^1 \rho(x)\,dx = \sum_{i=k}^{b} \frac{\rho_i}{i} = \sum_{i=k+1}^{b} \frac{\rho^*_{k,i}}{i} + \frac{1}{k} \sum_{i=b+1}^{\infty} \rho^*_{k,i}. \qquad (26) $$

By Lemma 4.5 the right-hand side converges to $1/(2k)$ as $b \to \infty$. Therefore we can choose $b$ large enough so that claim (i) above is fulfilled.
Consider now claim (ii). We write

$$ F_{k,\rho}(\alpha) = \sum_{i=k+1}^{\infty} \rho_i f_{k,i}(\alpha) = \sum_{i=k+1}^{\infty} \rho^*_{k,i} f_{k,i}(\alpha) - \sum_{i=b+1}^{\infty} \rho^*_{k,i} f_{k,i}(\alpha) = \alpha - \sum_{i=b+1}^{\infty} \rho^*_{k,i} f_{k,i}(\alpha), $$

where the last identity follows from Lemma 4.5. The claim is implied by the remark that $f_{k,i}(\alpha) > 0$ for $i \ge k + 1$ and $\alpha \in (0, 1]$.

Finally, claim (iii) is a consequence of the fact that $f_{k,i}(\alpha) = \binom{i-1}{k} \alpha^k + O(\alpha^{k+1})$ together with $i \le b$. □
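The behaviour asserted by claims (ii) and (iii) can be observed by iterating the recursion (25) for the truncated distribution. The following Python sketch is an illustration under stated assumptions, not the authors' code: it takes $\rho_i = (k-1)/((i-1)(i-2))$ for $k+1 \le i \le b$, puts the remaining mass on degree $k$, and evaluates $f_{k,i}(\alpha)$ as an upper binomial tail. The names `rho_truncated`, `f` and `density_evolution`, and the parameter values used in the example, are hypothetical.

```python
from math import comb

def rho_truncated(k, b):
    """Coefficients rho_i, i = k..b, of the truncated degree distribution (k >= 2)."""
    rho = {i: (k - 1) / ((i - 1) * (i - 2)) for i in range(k + 1, b + 1)}
    rho[k] = 1.0 - sum(rho.values())  # leftover mass is moved to degree k
    return rho

def f(k, i, alpha):
    """P{Binomial(i-1, alpha) >= k}, written as an upper-tail sum."""
    n = i - 1
    return sum(comb(n, j) * alpha**j * (1 - alpha)**(n - j) for j in range(k, n + 1))

def density_evolution(k, b, steps):
    """Iterate alpha_{t+1} = sum_i rho_i f_{k,i}(alpha_t) starting from alpha_0 = 1."""
    rho = rho_truncated(k, b)
    alpha, path = 1.0, [1.0]
    for _ in range(steps):
        alpha = sum(r * f(k, i, alpha) for i, r in rho.items())
        path.append(alpha)
    return path
```

With `density_evolution(3, 50, 200)` the sequence first decreases by roughly $(k-1)/(b-1)$ per iteration and then collapses very quickly once $\alpha_t$ is small, in line with $F_{k,\rho}(\alpha) = O(\alpha^k)$. Larger $b$ brings the design rate closer to $1 - 1/k$ at the price of a longer initial phase.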

4.3 Capacity 

Proof of Proposition 3.1. By standard information-theoretic arguments [3], the channel information capacity is given by

$$ C(\omega, \lambda) = \lim_{N\to\infty} \frac{1}{N\ell} \sup_{p_X} I(X; Y). \qquad (27) $$

Here $I(X;Y) = \sum_{x,y} \mathbb{P}_{X,Y}(x,y) \log\{\mathbb{P}_{X,Y}(x,y)/(\mathbb{P}_X(x)\mathbb{P}_Y(y))\}$ is the mutual information between $X$ and $Y$, and the supremum is taken over all possible input distributions.

Writing the mutual information in terms of entropy and conditional entropy, and using our channel model, we have $I(X;Y) = H(Y) - H(Y|X) = H(Y) - H(Z)$. Since $H(Z)$ does not depend on the input distribution, the mutual information is maximized when the latter is uniform. This implies that the output is uniform as well, and we get $H(Y) = \log(q^{\ell m})$.

Finally, $H(Z)$ is the logarithm of the number $A(s, \ell, m)$ of $\ell \times m$ matrices of rank $\mathrm{rank}(Z) = \ell\omega = s$. We have $A(s, \ell, m) = q^{\ell m}\, \mathbb{P}_0\{\mathrm{rank}(Z) = s\}$, where $\mathbb{P}_0$ denotes probability with respect to a uniformly random matrix $Z$. Assume without loss of generality that $\ell, m \ge s$. If $z_1, \dots, z_\ell$ are the rows of $Z$, then the first $s$ rows are independent with probability $(1 - q^{-m})(1 - q^{-m+1}) \cdots (1 - q^{-m+s-1}) \ge 1 - s\,q^{-m+s}$. Given this, each of the remaining $\ell - s$ rows lies in the space spanned by the first $s$ with probability $q^{-(m-s)}$. Therefore

$$ A(s, \ell, m) \ge q^{\ell m}\, \mathbb{P}_0\{z_{s+1}, \dots, z_\ell \in \langle z_1, \dots, z_s \rangle,\ \mathrm{rank}(z_1, \dots, z_s) = s\} \ge q^{\ell m}\, q^{-(\ell - s)(m - s)}\, (1 - s\,q^{-m+s}). \qquad (28) $$

On the other hand, $\mathbb{P}_0\{\mathrm{rank}(Z) = s\}$ is upper bounded by summing, over all subsets of $s$ rows (there are $\binom{\ell}{s} \le 2^\ell$ such subsets), the probability that these rows are independent and that the other rows are in the span generated by them. Such an upper bound is at most a factor $2^\ell$ larger than the above lower bound. By taking $\ell \to \infty$ with $\ell = \lambda N = N - m$, $\lambda \in (0, 1)$ and $\omega \in (0, \min(1, (1 - \lambda)/\lambda))$, we get

$$ H(Z) = \log A(s, \ell, m) = N\ell(\omega - \omega^2\lambda) + O(N). \qquad (29) $$

Therefore $I(X;Y) = H(Y) - H(Z) = N\ell(1 - \lambda - \omega + \omega^2\lambda) + O(N)$, whence the thesis follows. □
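The asymptotic count used above can be compared with an exact one at finite size. The sketch below is an illustration, not part of the proof: it relies on the standard closed-form count of $\ell \times m$ matrices of rank $s$ over $\mathbb{F}_q$, namely $\prod_{i=0}^{s-1} (q^\ell - q^i)(q^m - q^i)/(q^s - q^i)$, takes logarithms in base $q$, and compares $(\ell m - \log_q A)/(N\ell)$ with $(1-\omega)(1-\lambda-\lambda\omega)$. The function names and the parameter values are hypothetical.

```python
from math import log

def count_rank(l, m, s, q=2):
    """Exact number of l x m matrices of rank s over F_q (standard counting formula)."""
    num, den = 1, 1
    for i in range(s):
        num *= (q**l - q**i) * (q**m - q**i)
        den *= q**s - q**i
    return num // den  # the quotient is an exact integer

def normalized_rate(N, lam, omega, q=2):
    """(H(Y) - H(Z)) / (N l) with l = lam N, m = N - l, s = omega l, logs in base q."""
    l = round(lam * N)
    m = N - l
    s = round(omega * l)
    h_z = log(count_rank(l, m, s, q), q)
    return (l * m - h_z) / (N * l)

def capacity(lam, omega):
    """Limiting value (1 - omega)(1 - lam - lam * omega)."""
    return (1 - omega) * (1 - lam - lam * omega)
```

For example, `normalized_rate(400, 0.25, 0.5)` can be compared with `capacity(0.25, 0.5)`. The gap is of order $1/\ell$, consistent with the $O(N)$ correction in Eq. (29).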